A METHOD TO CLASSIFY CHARACTERS
OF UNKNOWN ANCIENT SCRIPTS 


SEPPO KOSKENNIEMI, ASKO PARPOLA AND SIMO PARPOLA 


THE PURPOSE 


The present paper has grown out of our efforts to decipher the Indus
script, used in NW India c. 2500-2000 B.C. Since no bilingual inscriptions
have so far been unearthed, the only key to the script is the internal
structure of the preserved texts, as it was in the case of the Cretan
Linear B script that M. Ventris succeeded in decoding 15 years ago.

The method used by Ventris consisted in arranging the easily discerned
syllabic signs into tables showing their components (C + V) on the basis
of the textual behaviour of the signs alone, and then giving phonetic values
to the components on the basis of certain clue words (place names). E.g.,
Ventris obtained a series of signs containing the same consonant but
different vowels from words that consisted of several signs and occurred in
variants differing only in the last sign; such variants apparently represented
declined forms of the same word (cf. Latin do-mi-nus, do-mi-ni, do-mi-no).
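The bookkeeping step behind this kind of clue-word reasoning can be sketched in Python. The sketch is only illustrative and is not Ventris's actual procedure: the function `final_sign_alternations` and the toy word list are hypothetical, and the words are assumed to be given already divided into signs. For every word stem it collects the final signs that alternate after it; deciding which component (consonant or vowel) those signs share is left to the analyst.

```python
from collections import defaultdict

def final_sign_alternations(words):
    """Group words (tuples of sign names) by everything except the last sign.

    Stems that occur with two or more different final signs are candidate
    declined forms; the alternating final signs are then hypothesised to
    share a component (in the Latin example, the consonant n).
    """
    endings = defaultdict(set)
    for word in words:
        if len(word) >= 2:
            endings[tuple(word[:-1])].add(word[-1])
    # keep only stems that really alternate
    return {stem: sorted(finals)
            for stem, finals in endings.items() if len(finals) > 1}

# Toy input modelled on the Latin example in the text.
words = [("do", "mi", "nus"), ("do", "mi", "ni"), ("do", "mi", "no"),
         ("ro", "sa")]
print(final_sign_alternations(words))
# {('do', 'mi'): ['ni', 'no', 'nus']}
```

The stem ("ro",) is dropped because it occurs with only one ending, so it gives no evidence of alternation.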

In Linear B the total number of different signs is c. 250, of which c. 90 
are used syllabically, the logograms being easily recognizable by means of 
word division: they occur alone and together with numerals in the texts. 
The syllabic signs consist, as Ventris presupposed, of open syllables only
(types CV, V), and the texts contain long sentences clearly divided into
words, many of which are of considerable length.

The Indus script, a thousand years older, presents a much more difficult
problem. One of the great obstacles to its decipherment is the shortness
and one-sidedness of the preserved materials, which consist of c. 2000
short inscriptions, mostly seals, with an average length of 5 signs.
Another difficulty is that the words are not separated from each other.
This difficulty can, however, be removed to a certain degree, for, as we
shall show in another paper, the separation is possible to a considerable
extent; the normal word length varies from one to three signs. The
shortness of the words, the number of different signs (approximately 300),
and the analogy of other contemporary writing systems show that the
Indus script must have been of the logo-syllabic type. Hence there must
have been closed syllables and logograms besides open syllables (types
CV, V, VC, CVC, WORD), a fact that does not make decipherment easier.

We have approached the problem, inter alia, by starting from a selection
of known ancient scripts. The idea has been to develop a method of
analysis which yields the same reasonable result when applied to any
sample of a known script, and hence also when applied to a sample of an
unknown script.


THE PROBLEM 


Mathematically, the basic problem is to find a classification procedure
which is based purely on statistical principles and which groups the signs
of a written language in such a way that the classification is constant,
that is, independent of the language. The only information that can be
used by the procedure consists of the various frequencies and other
statistics obtained from a sample of the written language, which may be a
quite unknown one. No grammatical or phonetic information is assumed
to be available.

A good collection of data for classification purposes could comprise, 
for example, the frequencies of different signs, the position of signs within 
words and their occurrence in ligatured signs, and the pairwise appearance 
of signs. 

Should a classification method be found which meets these requirements,
it will very possibly have the following feature: applied to known
languages, it gives a grouping which may be very near to some
grammatical or phonetic classification.

However, the procedure need not lead to any a priori known
classification. The only feature required is that the result be constant.


THE METHOD 


The input data for this classification procedure consist only of the
information obtained by collecting the pairwise frequencies of all the
different signs. In other words, the procedure tries to group together signs
which appear in similar surroundings, and avoids grouping signs which,
statistically, very seldom appear next to each other. We denote the matrix
of the pairwise frequencies by F:

$$F = \begin{pmatrix} f_{11} & f_{12} & \cdots & f_{1n} \\ f_{21} & f_{22} & \cdots & f_{2n} \\ \vdots & \vdots & \ddots & \vdots \\ f_{n1} & f_{n2} & \cdots & f_{nn} \end{pmatrix}$$

where $f_{ij}$ shows how many times the signs i and j appear next to each
other (first i and then j), and n is the number of different signs. If two
signs have rows in this matrix which are much alike, we can say that these
signs have similar right-hand surroundings. Correspondingly, two similar
columns indicate that the left-hand surroundings of the signs are similar.
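A minimal Python (NumPy) sketch of how F can be tallied from coded texts. It assumes, since the text does not say, that each inscription is given as a list of sign names and that pairs are counted only within an inscription, never across inscription boundaries; the function name `pair_frequency_matrix` and the sign names are hypothetical.

```python
import numpy as np

def pair_frequency_matrix(texts, signs):
    """Return F, where F[i, j] counts how often sign i is immediately followed by sign j.

    Assumption: each text (e.g. one inscription) is a list of sign names, and
    pairs are counted only inside a text, never across text boundaries.
    """
    index = {sign: k for k, sign in enumerate(signs)}
    F = np.zeros((len(signs), len(signs)), dtype=int)
    for text in texts:
        for left, right in zip(text, text[1:]):
            F[index[left], index[right]] += 1
    return F

# Toy example: two short "inscriptions" written with three hypothetical signs.
signs = ["A", "B", "C"]
F = pair_frequency_matrix([["A", "B", "A", "C"], ["B", "A"]], signs)
print(F)
# [[0 1 1]
#  [2 0 0]
#  [0 0 0]]
```

Row i of F describes what follows sign i (its right-hand surroundings) and column j describes what precedes sign j.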

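In order to eliminate the effect of the individual frequencies of the signs
themselves, we must normalize the binomially distributed random
variables $f_{ij}$ by their means and standard deviations. Assuming that the
sample is large, we can use the formula

$$\bar{f}_{ij} = \frac{f_{ij} - s_{ij}}{\sqrt{s_{ij}}}$$

where

$$s_{ij} = \frac{\left(\sum_{k} f_{ik}\right)\left(\sum_{l} f_{lj}\right)}{\sum_{k}\sum_{l} f_{kl}}$$

and where the new variables $\bar{f}_{ij}$ form the new double-frequency
matrix $\bar{F}$. Here $s_{ij}$ is the frequency of the pair (i, j) expected if
the signs combined at random; for a large sample in which each particular
pair is rare, the variance of $f_{ij}$ is approximately equal to this mean, so
that $\sqrt{s_{ij}}$ serves as its standard deviation.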
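The normalization can be sketched with NumPy as follows. The function name is hypothetical, and setting cells with s_ij = 0 to zero is our own assumption: such cells belong to a sign that never occurs on that side, and the formula is undefined for them.

```python
import numpy as np

def normalize_pair_frequencies(F):
    """Return F-bar with entries (f_ij - s_ij) / sqrt(s_ij).

    s_ij = (row total of i) * (column total of j) / (grand total) is the count
    expected if signs combined at random. Cells where s_ij == 0 are set to 0
    (our assumption; the formula is undefined there).
    """
    F = np.asarray(F, dtype=float)
    s = np.outer(F.sum(axis=1), F.sum(axis=0)) / F.sum()
    F_bar = np.zeros_like(F)
    defined = s > 0
    F_bar[defined] = (F[defined] - s[defined]) / np.sqrt(s[defined])
    return F_bar

# Every expected count s_ij is 2 here, so each entry becomes +1/sqrt(2) or -1/sqrt(2).
F_bar = normalize_pair_frequencies([[1, 3], [3, 1]])
print(np.round(F_bar, 3))
```

Pairs that occur more often than chance get positive values and rarer-than-chance pairs get negative ones, independently of how frequent the two signs are on their own.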

We now define the distance between two signs by the cross product of the
corresponding rows of the matrix $\bar{F}$ when the right-hand surroundings
of the signs are concerned, and of the corresponding columns when we are
interested in their left-hand surroundings. We normalize these cross
products by the means and norms of the row (or column) vectors and
subtract the result from one. In this way we get the distance matrix R,
which is the only input to the classification procedure.
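One reading of this normalization is that r_ij is one minus the Pearson correlation between the two row (or column) vectors, so that r_ij ranges from 0 (identical surroundings) to 2 (opposite surroundings). The NumPy sketch below implements that reading. The function name is hypothetical, and so is the `side="both"` option, which simply concatenates a sign's row and column, because the text does not say how the two surroundings were combined.

```python
import numpy as np

def distance_matrix(F_bar, side="right"):
    """Return R with r_ij = 1 - Pearson correlation of the surroundings of signs i and j.

    side="right": compare rows (the text following each sign).
    side="left":  compare columns (the text preceding each sign).
    side="both":  hypothetical option, concatenating row i and column i.
    """
    X = np.asarray(F_bar, dtype=float)
    if side == "left":
        X = X.T
    elif side == "both":
        X = np.hstack([X, X.T])
    X = X - X.mean(axis=1, keepdims=True)   # subtract each vector's mean
    norms = np.linalg.norm(X, axis=1)
    norms[norms == 0] = 1.0                 # a constant vector gets distance 1 to every sign, itself included
    X = X / norms[:, None]
    return 1.0 - X @ X.T                    # 0 = same pattern, 2 = opposite pattern

# Rows 0 and 1 are proportional (distance 0); row 2 is reversed (distance 2).
R = distance_matrix([[1, 2, 3], [2, 4, 6], [3, 2, 1]])
print(np.round(R, 3))
```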

We needed a new classification method because the known methods, such
as factor and discriminant analysis, taxonomy, and so on, lack the correct
criterion for our purposes. We are not searching for typical signs (that
kind of classification is done in taxonomy), and we do not want signs with
a strong negative correlation to appear in the same group (which happens
when factor analysis is used). We found that a good criterion for our
purposes is the total average within-groups distance K, which is to be
minimized:


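$$K = \frac{1}{N} \sum_{k=1}^{m} \sum_{\substack{i, j \in I_k \\ i \neq j}} r_{ij}$$

where

$$N = \sum_{k=1}^{m} n_k (n_k - 1)$$

and
$n_k$ = number of members (signs) in the k'th class
$m$ = number of classes
$k$ = class index
$I_k$ = set of indices of the signs in the k'th class
$r_{ij}$ = element of the distance matrix R (distance between signs i and j)
$N$ = total number of distances within all classes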
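The criterion translates directly into code. The Python/NumPy sketch below is a minimal illustration: the function name `criterion_K` is hypothetical, labels[i] is assumed to hold the class index of sign i, and returning 0 when every class is a singleton (N = 0) is our own convention.

```python
import numpy as np

def criterion_K(R, labels):
    """Average within-class distance:
    K = (1/N) * sum_k sum_{i, j in I_k, i != j} r_ij,  N = sum_k n_k * (n_k - 1).
    labels[i] is the class index of sign i.
    """
    R = np.asarray(R, dtype=float)
    labels = np.asarray(labels)
    total, N = 0.0, 0
    for k in np.unique(labels):
        members = np.flatnonzero(labels == k)      # the index set I_k
        block = R[np.ix_(members, members)]
        total += block.sum() - np.trace(block)     # leave out the i == j terms
        N += len(members) * (len(members) - 1)
    return total / N if N else 0.0                 # all-singleton case: our convention

R = np.array([[0.0, 0.0, 2.0],
              [0.0, 0.0, 2.0],
              [2.0, 2.0, 0.0]])
print(criterion_K(R, [0, 0, 1]), criterion_K(R, [0, 1, 0]))   # 0.0 2.0
```

Summing over ordered pairs counts each distance twice, which is matched by N counting n_k(n_k - 1) ordered pairs per class.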


The classification procedure is extremely simple: it is based on
successively moving signs from one class to another. Each move is selected
from among all possible moves in such a way that the value of the
criterion K after that step is minimal. This strategy leads to a procedure
which necessarily converges, since the value of K decreases at every step
and there is only a finite number of possible classifications; there is,
however, no proof that the final classification is optimal. In practice a
very good, near-optimal solution is found.
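The move-by-move search can be sketched in Python as a best-single-move descent on K. This is a minimal illustration, not the authors' FORTRAN program: the function names are hypothetical, and the rule that a move may not empty a class is our own assumption.

```python
import numpy as np

def criterion_K(R, labels):
    """K = (sum of r_ij over ordered pairs i != j in the same class) / N,
    with N = sum over classes of n_k * (n_k - 1)."""
    R = np.asarray(R, dtype=float)
    labels = np.asarray(labels)
    total, N = 0.0, 0
    for k in np.unique(labels):
        idx = np.flatnonzero(labels == k)
        block = R[np.ix_(idx, idx)]
        total += block.sum() - np.trace(block)
        N += len(idx) * (len(idx) - 1)
    return total / N if N else 0.0

def best_move_descent(R, labels, m):
    """Repeatedly apply the single move (one sign to another of the m classes)
    that gives the lowest K; stop when no move lowers K.

    Assumption (not stated in the text): a move may not empty a class.
    """
    labels = list(labels)
    current = criterion_K(R, labels)
    while True:
        best_value, best_move = current, None
        for i in range(len(labels)):
            if labels.count(labels[i]) == 1:
                continue                              # sign i is alone in its class
            for k in range(m):
                if k == labels[i]:
                    continue
                trial = labels.copy()
                trial[i] = k
                value = criterion_K(R, trial)
                if value < best_value - 1e-12:       # strictly better than anything seen so far
                    best_value, best_move = value, (i, k)
        if best_move is None:                         # no move lowers K: stop
            return labels, current
        i, k = best_move
        labels[i] = k
        current = best_value

# Two hypothetical clusters {0, 1} and {2, 3}: distance 0 inside, 2 across.
groups = np.array([0, 0, 1, 1])
R4 = np.where(groups[:, None] == groups[None, :], 0.0, 2.0)
labels, K = best_move_descent(R4, [0, 1, 0, 1], m=2)
print(labels, K)   # the two clusters are recovered with K = 0
```

Each accepted move lowers K by a positive amount and there are finitely many labelings, so the loop terminates; like the procedure described in the text, it may stop at a local rather than a global minimum.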

Starting from two groups, the procedure is then repeated with an
increasing number of groups, the solution of each step being used as the
initial classification for the next one. This method involves a great deal
of computation and cannot be carried out without an electronic computer.
It has been programmed in FORTRAN, and a large amount of material has
been run at NEUCC¹ on an IBM 7090.
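The outer loop over the number of groups can be sketched as below, again in Python with NumPy. The text does not say how a new group is opened when the number of groups is increased, so the seeding rule here (the sign farthest, on average, from the rest of its class starts the new class) and the arbitrary initial split into two groups are our own hypothetical choices; all function names are likewise illustrative.

```python
import numpy as np

def criterion_K(R, labels):
    """Mean distance over ordered pairs of distinct signs in the same class."""
    labels = np.asarray(labels)
    total, N = 0.0, 0
    for k in np.unique(labels):
        idx = np.flatnonzero(labels == k)
        block = R[np.ix_(idx, idx)]
        total += block.sum() - np.trace(block)
        N += len(idx) * (len(idx) - 1)
    return total / N if N else 0.0

def descend(R, labels, m):
    """Best-single-move descent on K; a move may not empty a class (assumption)."""
    labels = list(labels)
    current = criterion_K(R, labels)
    while True:
        moves = [(criterion_K(R, labels[:i] + [k] + labels[i + 1:]), i, k)
                 for i in range(len(labels)) for k in range(m)
                 if k != labels[i] and labels.count(labels[i]) > 1]
        if not moves or min(moves)[0] >= current - 1e-12:
            return labels
        current, i, k = min(moves)
        labels[i] = k

def mean_distance_to_class(R, labels, i):
    mates = [j for j in range(len(labels)) if labels[j] == labels[i] and j != i]
    return float(np.mean(R[i, mates])) if mates else float("-inf")

def classify_2_to_m(R, max_groups):
    """Classifications into 2, 3, ..., max_groups groups, each seeded by the previous one."""
    R = np.asarray(R, dtype=float)
    n = len(R)
    labels = [0] * (n // 2) + [1] * (n - n // 2)      # hypothetical 2-group start
    results = {}
    for m in range(2, max_groups + 1):
        if m > 2:                                     # hypothetical seeding of the new class
            seed = max(range(n), key=lambda i: mean_distance_to_class(R, labels, i))
            labels[seed] = m - 1
        labels = descend(R, labels, m)
        results[m] = list(labels)
    return results

# Three hypothetical clusters of two signs each.
blocks = np.array([0, 0, 1, 1, 2, 2])
R6 = np.where(blocks[:, None] == blocks[None, :], 0.0, 2.0)
result = classify_2_to_m(R6, 3)
print(result[2], result[3])
```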


THE RESULTS 


The classification program has been applied to samples of five ancient
scripts, each consisting of 8,000-10,000 signs. In this paper we present the
results for one sample in extenso in order to show the whole process; for
the other samples only the final results are given. It can easily be seen
that in every case the elements of the final groups stay together during
the entire classification process. In fact, the basic groups of the final
results are defined by means of three criteria:

(1) The frequencies of the elements of each basic group must not be
small.

(2) The K-value (distance from the center of the group) must be small
for every element which belongs to a basic group.

(3) Each basic group must hold together during the classification process.
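The three criteria can be checked mechanically once the successive classifications are stored. In the Python sketch below, criteria (1) and (2) become thresholds `min_freq` and `max_k`; these parameter names, the function names and the sign names S1-S4 are hypothetical, since the text only says that frequencies must not be small and K-values must be small. Criterion (3) is read as: the signs share a class in every classification of the sequence.

```python
def holds_together(group, classifications):
    """Criterion (3): True if every sign in `group` is in one and the same class
    in each successive classification (2 groups, 3 groups, ...).
    Each classification maps a sign name to its class index."""
    return bool(group) and all(
        len({labels[sign] for sign in group}) == 1 for labels in classifications)

def candidate_basic_group(group, freq, k_value, classifications, min_freq, max_k):
    """Apply criteria (1) and (2) as thresholds, then check criterion (3).

    min_freq and max_k are hypothetical parameters: the text gives no values.
    """
    kept = [s for s in group if freq[s] >= min_freq and k_value[s] <= max_k]
    return kept, holds_together(kept, classifications)

# Hypothetical signs S1-S4 classified into 2 and then 3 groups.
runs = [{"S1": 0, "S2": 0, "S3": 0, "S4": 1},
        {"S1": 2, "S2": 2, "S3": 0, "S4": 1}]
print(holds_together(["S1", "S2"], runs))   # True
print(holds_together(["S1", "S3"], runs))   # False
```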


1 Northern Europe University Computing Center, Lyngby, Denmark. 
In the listings of the results we present the percentage distribution of the
signs and the classifications into 2, 3 and more groups. The symbols of the
signs have been sorted into ascending order according to their K-values.
THIS MEANS THAT THE FIRST SIGNS ARE THE MOST SIGNIFICANT ONES.
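A listing of this kind can be produced as in the following Python sketch. The text glosses a sign's K-value as its distance from the center of its group; computing it as the mean distance to the other members of the class is our assumption, and the function names and toy distances are hypothetical.

```python
import numpy as np

def sign_K_values(R, labels):
    """Mean distance from each sign to the other members of its class.

    Taking the mean distance to the other members as the K-value is our
    assumption. Signs alone in their class get NaN.
    """
    R = np.asarray(R, dtype=float)
    labels = np.asarray(labels)
    values = np.full(len(labels), np.nan)
    for i in range(len(labels)):
        mates = np.flatnonzero((labels == labels[i]) & (np.arange(len(labels)) != i))
        if mates.size:
            values[i] = R[i, mates].mean()
    return values

def ranked_listing(names, R, labels):
    """Return {class: sign names in ascending order of K-value}."""
    values = sign_K_values(R, labels)
    listing = {}
    for k in sorted(set(labels)):
        members = [i for i in range(len(names)) if labels[i] == k]
        members.sort(key=lambda i: values[i])
        listing[k] = [names[i] for i in members]
    return listing

names = ["A", "B", "C", "D"]          # hypothetical signs
R = np.array([[0.0, 0.1, 0.5, 1.0],
              [0.1, 0.0, 0.3, 1.0],
              [0.5, 0.3, 0.0, 1.0],
              [1.0, 1.0, 1.0, 0.0]])
print(ranked_listing(names, R, [0, 0, 0, 1]))
# {0: ['B', 'A', 'C'], 1: ['D']}
```

B comes first because it is, on average, closest to the other members of its class, which in the listings marks it as the most significant sign of that group.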


1. Elamite Cuneiform 


Elamite cuneiform of c. 1100 B.C. is a syllabic writing consisting of c. 130
syllabic signs (types CV, V, VC, CVC) and c. 30 logograms (used mostly
as determinatives). The material analyzed* comprises the 47 stereotyped
inscriptions (mostly dedications) from Tchogha-Zanbil published by
M.-J. Stève in Iranica Antiqua, II (1962), 22-76, and two longer
inscriptions of Šutruk-Nahhunte I and Šilhak-Inšušinak from F. W. König,
Die elamischen Königsinschriften (= Archiv für Orientforschung, Beiheft 16)
(Graz, 1956), nos. 28 A and 45. All the 70 syllabic signs occurring in the
sample have been included in the first two classifications, according to the
adjacent signs on each side (Tables 2 and 3). Then follows a classification
which is based on these two analyses but carried out anew with fewer
variables, only the forty most frequent signs being included (Table 4).

* Unlike the other samples, this sample has not been coded directly from
the cuneiform but from the transcription; this has caused an inexactitude
with regard to the H-sign (thus in König's transcription), which Stève
transcribes as the vowel with which the preceding syllable ends, followed
by H.


TABLE 1


Frequency distribution of Elamite signs


Symbol % Symbol % Symbol % Symbol % Symbol %


Total 70 different symbols
Total 6103 signs


TABLE 2


Classification of the Elamite syllables according to the text
following the signs


Two groups Three groups Four groups Five groups


I II III
LA AT TI


TAS TU IR
IA LU KI


TABLE 3 


Classification of the Elamite syllables according to the text 
preceding the signs 


Two groups Three groups Four groups 
TABLE 4


Final classification of the Elamite syllables according to the text on 
both sides of the signs (only 40 most frequent signs) 


Two groups Three groups Four groups 





Five groups    Six groups (these groups are no longer relevant)
2. Cretan Linear B Script


Cretan Linear B is the script of archaic Greek of c. 1500 B.C. This
(logo-)syllabic writing consists of c. 250 signs, of which c. 90 have syllabic
values (types CV and V only). The material analyzed comprises texts
nos. 1-45; 47-103,4; 134-136,2 in Tabellae Mycenenses selectae
(= Textus minores, 28), ed. by C. J. Ruijgh (Leiden, 1962).


TABLE 5 


Relative frequencies of the Linear B signs 


Symbol % Symbol % Symbol %
RO 4.44 DE 1.64 DI 0.56
TO 4.20 U 1.62 ZE 0.52
KO 3.88 PE 1.59 MI 0.45
E 3.86 MA 1.59 ZA 0.45
JO 3.66 RI 1.55 A2 0.43
O 3.55 PO 1.30 QI 0.40
A 3.46 NE 1.28 NU 0.38
JA 3.41 SO 1.26 JE 0.36
RE 3.23 KU 1.26 QA 0.36
TE 3.12 WA 1.23 TU 0.36
TA 3.05 TI 1.21 SI, 0.25
RA 2.96 DA 1.19 A3 0.25
KE 2.96 I 1.05 ZO 0.20
PI 2.76 RU 0.99 SU 0.16
WE 2.74 WI 0.99 PTE 0.16
WO 2.69 DO 0.99 RA3 0.16
PA 2.58 KI 0.92 RA2 0.13
NO 1.97 MO 0.90 PU2 0.13
ME 1.93 QO 0.85 DU 0.11
KA 1.86 PU 0.79 TA2 0.09
SI 1.84 NI 0.74 RO2 0.09


TABLE 6 


Classification of Cretan Linear B Script 


According to the text According to the text According to the text 
following the signs preceding the signs* on both sides of the signs* 





* 40 most frequent signs only were included. 


3. Neo-Assyrian Cuneiform 


Neo-Assyrian cuneiform of c. 700 B.C. is a logo-syllabic writing consisting 
of c. 500 signs of which c. 200 have a syllabic value (types CV, V, VC, CVC, 
VCV). Very many signs are used now syllabically, now as logograms; 
besides, one and the same sign can have different syllabic values. All this 
naturally hampers the classification procedure. The analyzed material 
comprises 43 letters (R. F. Harper, Assyrian and Babylonian Letters 
[Chicago, 1892-1914], nos. 1-2, 78, 85, 88, 90-1, 101, 126, 131, 134, 138, 
140, 142, 144, 152, 154, 157, 167-8, 175, 178, 181, 386-7, 391, 406, 408, 
410-1, 413-5, 419-421, 423, 531, 537-8, 541, 544, 561). The signs listed in
Table 7 (altogether 6948 out of 9089) have been included in the
classification in Table 8. The logographic values given are those occurring
in these texts.




TABLE 7 


Relative frequencies of Neo-Assyrian cuneiform 


Syllabic value*


A 
NI (LÍ, S/ZAL) 
NA 


AN 
INA 


LA 

BE, BAD/T, TIL 

TU, UD/T, PAR (PIR, HIS, LAH) 
MU 

BU, PU 

LU, DIB/P 

LI 

KA 

E 

PA (HAT) 


MAN, NIS 
SU (QAT) 
TE 

ME, SIB/P 


DA, TA 


* Rare values in parentheses.


Logographic value 





(‘water’) 
(‘oil’) 


‘king’ (1) 
‘god’
‘in’? 


(‘statue’) 
(‘plant’) 
(‘totality’) 

sign of pl. 

‘one’; sign of PN 
‘with’ 

‘lord’ 


10 
‘man’ 

many values 
‘day’, etc. 
‘name’ 
(‘long’) 
(‘sheep’) 


(‘mouth’) 


a god, etc. 


(‘to sit’, etc.) 
‘face’ 

‘earth’ 

‘to cast’ 

‘to place’ 
‘house’ 
(‘horn’) 


‘good’ 


‘king’ (2); ‘20’
‘hand’ 

‘to approach’ 
(sign of pl.) 
‘heart’ 




Syllabic value Logographic value 


(MUH) ‘crown of the head’ 
‘servant’ 


UR, TAS, LIK/Q 
TI (‘to live’) 
IS, (MIL) 


(‘to present’) 
(‘to stand, to go’) 
(‘to compensate’) 
‘wood’ 
‘to enter’ 
KUR, MAT (SAT, LAT) ‘country’ 
AD/T/T ‘father’ 
ID/T/T (‘arm’) 
AS 


(GAL) ‘big’ 
UN ‘people’ 
[MA 





TABLE 8 
Classification of the Neo-Assyrian cuneiform 


According to the text following the signs 


SERVANT 
According to the text preceding the signs


1 
SERVANT 





According to the text on both sides of the signs 
4. Middle-Egyptian Hieroglyphs


Middle-Egyptian hieroglyphs date from c. 2000-1500 B.C. This
logo-syllabo-alphabetic writing consists of c. 700 signs, of which c. 100
have a phonetic value (types C, CC, CCC). The material has been taken
from A. H. Gardiner's Egyptian Grammar, 2nd edition (London, 1950),
passim. Only the 26 alphabetic signs (60% of the sample) were included
as variables in the classification. In examining the groups, which, unlike
those of the preceding samples, do not seem to follow any phonetic
principle, it must be borne in mind that we do not know the vowels
inherent in the consonant signs, which were written without any vowel
notation.














TABLE 9 TABLE 10 
Relative frequencies Classification according to the text 
following the signs 
Symbol %

N 10.13 I II III IV V

T 9.97 Q T C W I 

R 5.48 у 5 H Ye 

F 3.79 F P R B H 

M 3.79 G S Yi 5 D 

W 3.63 M T 3

i 3.28 H D 

$ 2.63 K 

C 1.99 

K 1.80 Classification according to the text 

P 1.64 preceding the signs 

D 1.47 

H 1.47 I II III IV V

А к Е уйй $ P 

Y 1.01 H M H к S 

2 А 

р 0.98 T Q 3 С Ї 

S 0.86 D N B G Y 

B 0.78 = н Y, D 

T 0.75 

Y, 071 H 

Q 0.50 

& 0.46 Classification according to the text 

H 0.40 on both sides of the signs 

G 0.26 

H 0.18 I II III IV V
F M P Š Ś 
H Ww T 3 C 
K Q S H H 
H N i B D 
T R Y. У, 
G D 
5. Sumerian Cuneiform


Sumerian cuneiform of c. 2500 and 2100 B.C. is a logo-syllabic writing
consisting of c. 700 signs, of which c. 70 are regularly used as phonetic
signs to indicate grammatical elements. We have coded a selection of
Old Sumerian royal inscriptions (Edmond Sollberger, Corpus des
inscriptions ‘royales’ présargoniques de Lagaš [Genève, 1956], pp. 37-38,
50-53) and the cylinder inscription of Gudea (F. Thureau-Dangin, Les
cylindres de Gudea (= Textes cunéiformes du Musée du Louvre, VIII)
[Paris, 1925]). The classification was first done with only half of the
last-mentioned text (Gudea A), and then with the whole material. We
present both results.


TABLE 11 
Absolute and relative frequencies of the Sumerian cuneiform 


Whole material Gudea-A only 


Symbol Freq. 
228 
226 
172 
Whole material Gudea-A only


Symbol Freq. 


Symbol 
NAM 
U 


Others 624! 





Others 254 
TOTAL 13176 TOTAL 5557 
TABLE 12


Classification of the Sumerian cuneiform 


Whole material Gudea-A only 


According to the text following According to the text following 
the signs the signs 
According to the text preceding According to the text preceding
the signs the signs 





According to the text on both According to the text on both 
sides of the signs sides of the signs 
TABLE 13


The distribution of the signs according to their vowel components 
in the classifications into four groups*


Classification according to the text following the signs 
Sample Group Type of the sign 
V CA CU CO CI CE AC UC OC IC EC CVC Rest Total





Elam- I 3 1 1 1 1 
ie П 0 0 6— 00.4 3— 2 1 1 1 18 
n 0. 0 9 —di 2 0 2 — 3$ 9 0 0 18 
IV | Ei 3 4— 00 1 1— 11 0 0 B 
Linear Т | АА 5 4 2 13 — — ——— m — — 1% 
B I 1” $29- db do AB, ee hum erem den гыз) Lud 
Ш MEC NE A E Se шшш ы ee дшш; А] 
IV [А/А/О 5 1 3 0 4— — ——— — — 16 
Neo- I E 012 — 00 1 1 — 10 1 1 18 
Assyr- П yU 8 1 — 0 0 0 0 — 10 2 2 16 
in Ш 0 1 0 — 8 2 1 0—11 1 1 16 
IV | AU 3 0 — 11 11 11—002 2 B 
Sumer- I EA 9 4 — 1330—00 2 0 A 
in П U 1 3— 02 0 1—10 3 0 12 
Ш о 2 1— 43 2 о— 1 0 0 0 B 
IV 0 1 3 — 3 1 0 0— 10 0 о 9 





* N.B. Since the vowels inherent in the signs of Egyptian hieroglyphic
writing are unknown, this sample has not been included.


Classification according to the text preceding the signs 
Sample Group Type of the sign 
V CA CU CO CI CE AC UC OC IC EC CVC Rest Total














Elam- I E/I 3 0 — 1100—82 0 0 17 
ite П А 3 2 — 5 0 7 0 — 0 0 2 0 19 
ПІ 0 2 6 — 20 0 7 — 00 0 1 18 
IV U 8 3 — 3 10 0 — 00 0 0 16 
Linear I ША 4 0 13 i= ~ ——— — — ll 
B п I 2 1 214—————— И 
IIT O/E 1 112 3-————-——— — 10 
IV 0 2.0 5 01—————— — 8 
Neo- I 0 8 2 2 0 1 0 — 01 2 3 19 
Assyr- II БЛ 3 3 — 1 1 0 0 — 3 0 4 1 18 
ian IH A 2 0 — 402 0 — 00 0 2 11 
IV UU 0 7 — 22 02— 00 0 0 15 
Sumer- I ОАЕ 4 6 — 2 4 1 0 — 10 1| 0 22 
ian П 4 1 — 13 2 0 — 0 0 2 0 13 
ш 0 4 2 — 3 0 0O 1 — 2 0 0 0 12 
IV 0 1 2 — 222 0 — 00 2 0 Ц 
Classification according to the text on both sides of the signs
Sample Group Type of the sign 


V CA CU CO CI CE AC UC OC IC EC CVC Rest Total
Elam- I A 8 0 — 000 0 — 00 0 0 9 
ite i 0 0 0 — 82 1 0 — 00 0 0 п 
ш U 0 3— 100 0 — 60 0 0 1 
IV 0 2 2 — 101 3— 0 0 0 0 9 
Linear I ПОА 0 0 15 3———— — — — 12 
B П EU 4 1 101———— — — — 9 
ш 0 2 1 312———— — == — 9 
IV 0 3 0 4 03———— — — — 10 
Neo- I EIU 011 — 0 1 0 O0 — 3.0 1 0 19 
Assyr- П U 2 1 — 12 1 1-— 01 3 3 16 
ian ш 0 8 1 — 301 0 — 00 2 2 17 
IV A 1 0 — 5 0 1 1 — 00 0 1 10 
Sumer- I АЕО 4 5 — 2 4 1 0 — 10 1 0 21 
ian п 0 4 2 — 123 0—00 4 0 16 
(whole III 0 4 1 — 22 0 0 — 2 0 O0 0 ә 1 
mate- IV 0 2 3 — 3 1 1 1 00 0 0 11 


rial) 
* 10 





CONCLUSIONS 


The classification of the Elamite and Neo-Assyrian signs has given 
results that can be considered excellent. It appears that the vowels, 
especially I/E and U, play a predominant part, and that the groups 
classified according to the text following the signs generally consist of 
CV signs and those classified according to the text preceding the signs 
consist of VC signs. In some cases the classification method also reveals 
the actual pronunciation of a sign with several values: thus it seems fairly 
certain that, e.g., the Neo-Assyrian sign JA/JI/JU is to be read, in most 
cases, JU and not JA as has been done hitherto; likewise apparently TU 
and not UD. 

From the classification of the hieroglyphs we can draw only the
conclusion that the consonants do not affect the grouping. Egyptologists
are invited to consider the groups given above with regard to the
unascertainable inherent vowels.

In the sample of Linear B, only the group containing syllables with the
vowel I/E has clearly come out. It is possible that the consonants have
affected the grouping here. The results of the Sumerian sample do not
make much sense.
Hence we can conclude that the method is very effective when VC signs
and CV signs are equally well represented in the script analyzed and when
it is possible to eliminate most of the logograms. It is, however, not wholly
useless in less favourable circumstances, as is shown by the Linear B
script, which contains open syllables only, and it should be worthwhile to
try it also in the decipherment of, e.g., Linear A.

We have learnt that the method alone is not enough for the successful
decipherment of an unknown script. But the very fact that useful results
can be attained by mechanical means shows that we are on the right
track. This line of work should now be continued, and new methods based
on other criteria should be developed alongside the one described here,
until all of them combined yield the solution.


APPENDIX 


On the classification of the phonemes of a language 


The classification of the phonemes of a language using as a criterion their
ability to combine with the other phonemes in the speech chain was
already suggested by E. Sapir. Thinking that our program might prove
useful in this field too, we have tentatively applied it to samples of modern
Finnish (from Mika Waltari's Kuun maisema), English (Agatha Christie's
Murder in Mesopotamia) and French (Albert Camus' La peste)*, and to two
samples of Latin (Tacitus' Annales, I, 1-3 and III, 1-3), comprising 3200
to 5500 letters each. It appeared that the material was insufficient for
an effective statistical analysis (the two samples of Latin did not give
identical results) and hence for a finer classification, and that the
inexactitude of conventional orthography causes some trouble (compare
the good results for Finnish, whose orthography is an almost phonetic
transcription, with the results for the other languages). But the fact that
the vowels were immediately separated from the consonants in all the
samples suggests that the results could be better if these shortcomings
were removed.


* From the first pages. 
Classification of the letters in a sample of Finnish


Relative frequency Classification according to the text following the letters 
of Finnish letters 


2 groups 3 groups 


Symbol y^ 







I 
Р 
K 
J 
V 
G 
M 
T 
D 
N 
H 









4 groups 5 groups 







H II IV V VI УП VIII 





I 
V A 
J I 
M E 
P U 
K 
Classification according to the text on both sides of the letters


4 groups 


3 groups 


2 groups 


II III IV


= 












6 groups 


5 groups 


II III IV V


I 












8 groups 


7 groups 


II III IV V VI VII VIII


I 


5 
> 
> 
E 
H 


I 







Classification of the letters in Sample I of Latin 


Relative frequency of the letters


Symbol %
I 11.77
E 10.75
U 8.84
A 8.50
T 7.61
S 7.03
R 6.62
M 6.04
N 5.94
O 5.77
C 3.99
L 3.86
P 3.62
D 2.56
B 1.91
G 1.30
V 1.19
Q 1.16
F 0.65
X 0.61
H 0.31


Classification according to the text following the letters


2 groups 3 groups


Classification according to the text preceding the letters


2 groups 3 groups








Classification according to the text 
on both sides of the letters 


2 groups 3 groups 


I 
D 
C 
H 
T 
F 
V
G 
R 
L 
M 
Classification of the letters in Sample II of Latin


Relative frequency Classification according to the text 
of the letters following the letters 














Symbol % 2 groups 3 groups 
I 11.60 I I I IH IH 
E 10.52 RF O V от 
A 9.53 ide ТЕ i: BR 
U 8.09 SNB AA DE- vM 
T 7.91 DS U R UF 
R 7.55 TX І ne MC 
S 6.88 HN G x Q
N 6.34 C P Q S G 
M 5.93 N 
O 5.35 C
C 4.00 
L 3.42 
P 3.01 
D 2.38 Classification according to the text 
B 1.71 preceding the letters 
Q 1.53 2 groups 3 groups 
G 1.44 
V 1.26
F 1.03 
X 0.31 
H 0.22 












Classification according to the text 
on both sides of the letters 


2 groups 3 groups 






I 
DX 
HS 
LcC 
RT 
BG 
NF 
MP 
V 
Classification of the letters in a sample of French


Relative frequency Classification according to the text 
of the letters following the letters 


E 
5 
N 
A 
T 
L 
I 
U 
R 
О 
р 
м 
C 
P 
У 
Q 
B 
F 
G 
H 
Y 
J 
X 





2 groups 3 groups 


п 







Classification according to the text 
preceding the letters 


2 groups 3 groups 








I II I II III
VD О VR O Y 
GB A TS E B 
YT E XL A N 
NS H G HM 
XC I C Q P 
RL Q D I 
мо J U 
F F 
Р 


Classification according to the text 
on both sides of the letters 








2 groups 3 groups 
I II
VS о 
GM M 
DT E 
RX О 
LP I 

BC H 
J F U 
Classification of the letters in a sample of English


Relative frequency Classification according to 
of the letters the text on both sides of the 
letters 


Symbol % 2 groups 







E 
T 
A 
О 
$ 
1 
N 
R 
H 
D 
L 
U I 
С М Е 
Р 

M B J 
У В 7 
F pd 
a ум 
> Ww P 
B 

ү 

K 

J 

Q 

7, 





POSTSCRIPT 


This paper was completed in October 1968. The code of the Indus script
was broken in January 1969. The three preliminary reports published so
far (July 1970) have been summarized by Asko Parpola in a paper entitled
"The Indus Script Decipherment: The Situation at the End of 1969", in the
Journal of Tamil Studies (Madras), II, 1 (April 1970), where further
references may be found.





